This script analyzes absorbance and fluorescence data from all
observations in my study of C. chamissoi.
Below you will see each step, explained in plain English, followed by
the code that:
- Organizes data
- Creates diagnostic tables and plots
- Saves each output (plot or table) into a clear folder structure
Helper functions are defined in the process_plate_run.R script
and called from this Markdown document.
Make sure that process_plate_run.R is saved in your working
directory before running.
Below, we set up the working directory and source that script.
Note: since parameters were removed, you should change the path
in setwd() to wherever your files actually live.
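The setup chunk itself is not shown above; a minimal sketch would look like the following (the path is a placeholder, not the real location of the project):

```r
# Placeholder path: point this at the folder that holds process_plate_run.R
# and the "Input PE" input folder on your machine.
setwd("~/projects/pe-analysis")
# Load the helper functions (tidy_all(), joindf_by_id(), etc.)
source("process_plate_run.R")
```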
# 0. readxl provides excel_sheets() and read_excel() used below
library(readxl)
# 1. Define the folder containing all Excel inputs
input_folder <- "Input PE"
# 2. Find all .xlsx files (full paths), excluding temp files that start with "~$"
excel_files <- list.files(
  path = input_folder,
  pattern = "\\.xlsx?$",
  full.names = TRUE
)
excel_files <- excel_files[!grepl("^~\\$", basename(excel_files))]
# 3. Loop over each file
for (file in excel_files) {
  # 3a. Create a clean object name (strip out folder & extension)
  name <- tools::file_path_sans_ext(basename(file))
  # 3b. Determine which sheet to read (second if possible, otherwise first)
  sheet_names <- excel_sheets(file)
  sheet_to_read <- if (length(sheet_names) >= 2) sheet_names[2] else sheet_names[1]
  # 3c. Read the chosen sheet, suppressing verbose messages
  data <- suppressMessages(read_excel(file, sheet = sheet_to_read))
  # 3d. Assign the data.frame to the global environment under "name"
  assign(name, data, envir = .GlobalEnv)
  # 3e. Print a message to confirm successful load
  message("Loaded: ", name, " (sheet = '", sheet_to_read, "')")
}
All eight Excel workbooks (four PE, four Fluorescence) are now loaded into R as data.frames named after each file's base name (PE1, PE2, …, Fluor4). Any temporary "~$…" files were skipped.
Next, we tidy each raw dataset and apply blank corrections. We first define which wells served as blanks in each Absorbance run (A01; A01, A02, A03; etc.) and each Fluorescence run, then call the wrapper tidy_all(). It:
- Runs tidy_and_correct() internally on each dataset.
- Assigns the cleaned results to the global environment as PE1_tidy, Fluor1_tidy, etc.
# 1. Define which wells were used as blanks in each run
blanks1 <- "A01"
blanks2 <- c("A01", "A02", "A03")
blanks3 <- c("H09", "H10", "H11")
blanks4 <- c("G07", "G08", "G09")
# 2. Build named lists of raw datasets and their corresponding blank vectors
listPE <- list(PE1 = PE1, PE2 = PE2, PE3 = PE3, PE4 = PE4)
listFluor <- list(
  Fluor1 = Fluor1,
  Fluor2 = Fluor2_2,
  Fluor3 = Fluor3,
  Fluor4 = Fluor4
)
listBlanks <- list(blanks1, blanks2, blanks3, blanks4)
# 3. Run the wrapper: it calls tidy_and_correct() on each dataset
tidy_all(listPE, listBlanks) # Produces PE1_tidy, PE2_tidy, PE3_tidy, PE4_tidy
## Processing: PE1 with blanks: A01
## [1] "A01" "C01" "D07" "E01" "F01" "F07" "G01" "H07"
## Saved: PE1_tidy to global environment.
## Processing: PE2 with blanks: A01, A02, A03
## [1] "A02" "A03" "A07" "A09" "A11" "B03" "D07" "G02" "G03" "G07"
## Saved: PE2_tidy to global environment.
## Processing: PE3 with blanks: H09, H10, H11
## [1] "D07" "D08" "G01" "H11"
## Saved: PE3_tidy to global environment.
## Processing: PE4 with blanks: G07, G08, G09
## [1] "A01" "A02" "A03" "A04" "A05" "A07" "A08" "A09" "A10" "A11" "A12" "B01"
## [13] "B02" "B03" "B04" "B05" "B06" "B08" "B11" "C01" "C02" "C03" "C04" "C05"
## [25] "C06" "C07" "C08" "C09" "C12" "D03" "D04" "D06" "D08" "D09" "D10" "D11"
## [37] "D12" "E06" "E11" "F01" "F06" "F07" "F08" "F10" "F11" "F12" "G01" "G07"
## [49] "G08" "G09" "G10" "G11" "H01" "H02" "H03" "H04" "H06" "H07" "H08" "H09"
## [61] "H10" "H12"
## Saved: PE4_tidy to global environment.
tidy_all(listFluor, listBlanks) # Produces Fluor1_tidy, Fluor2_tidy, etc.
## Processing: Fluor1 with blanks: A01
## Saved: Fluor1_tidy to global environment.
## Processing: Fluor2 with blanks: A01, A02, A03
## [1] "A02"
## Saved: Fluor2_tidy to global environment.
## Processing: Fluor3 with blanks: H09, H10, H11
## [1] "E06" "E07" "E08" "E09" "E10" "E11" "E12" "F01" "F04" "F05" "F06" "F07"
## [13] "F08" "F09" "F10" "F11" "F12" "G02" "G03" "G04" "G06" "G07" "G08" "G09"
## [25] "G10" "G11" "G12" "H01" "H02" "H03" "H04" "H05" "H06" "H07" "H08" "H09"
## [37] "H10" "H12"
## Saved: Fluor3_tidy to global environment.
## Processing: Fluor4 with blanks: G07, G08, G09
## [1] "G07" "G08" "H01" "H03" "H04" "H08" "H09" "H11" "H12"
## Saved: Fluor4_tidy to global environment.
# 4. Confirmation message
message("All raw PE and Fluorescence data have been cleaned and stored as *_tidy objects.")
All tidied up now!
After this step, you have eight cleaned data.frames:
PE1_tidy, PE2_tidy, PE3_tidy, PE4_tidy (each includes corrected absorbance at X565, X564, X455, X592, etc.)
Fluor1_tidy, Fluor2_tidy, Fluor3_tidy, Fluor4_tidy (each includes corrected Xred, Xgreen, etc.)
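The blank correction itself happens inside tidy_and_correct() in process_plate_run.R and is not shown in this document. As a rough sketch, assuming the usual approach of subtracting the mean reading of the designated blank wells from every well (blank_correct is a hypothetical name, not a function from that script):

```r
# Hypothetical sketch of a blank correction; the real logic lives in
# process_plate_run.R. Assumption: corrected value = raw value minus the
# mean of the blank wells for that column.
blank_correct <- function(df, blank_wells, value_cols) {
  for (col in value_cols) {
    blank_mean <- mean(df[[col]][df$Cell_ID %in% blank_wells], na.rm = TRUE)
    df[[col]] <- df[[col]] - blank_mean
  }
  df
}

# Toy example: well A01 is the blank
toy <- data.frame(Cell_ID = c("A01", "B01"), X564 = c(0.01, 0.05))
blank_correct(toy, blank_wells = "A01", value_cols = "X564")$X564  # 0.00 0.04
```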
## 📊 Summary Statistics Table
For each cleaned (_tidy) dataset, we want a quick
numeric summary (minimum, 1st quartile, median, mean, 3rd quartile,
maximum). We summarize:
- Xred from the Fluorescence datasets (Fluor1_tidy$Xred, Fluor2_tidy$Xred, Fluor3_tidy$Xred, Fluor4_tidy$Xred) as a measure of phycoerythrin fluorescence.
- X564 from the Absorbance datasets (PE1_tidy$X564, PE2_tidy$X564, PE3_tidy$X564, PE4_tidy$X564) as the primary absorbance wavelength for PE.
The table is written to output PE/export data/summary_tables/PE_Fluor_summary_stats.csv.
# 1. Create a named list of summary() outputs for each target column
summaries <- list(
  Fluor1 = summary(Fluor1_tidy$Xred),
  Fluor2 = summary(Fluor2_tidy$Xred),
  Fluor3 = summary(Fluor3_tidy$Xred),
  Fluor4 = summary(Fluor4_tidy$Xred),
  PE1 = summary(PE1_tidy$X564),
  PE2 = summary(PE2_tidy$X564),
  PE3 = summary(PE3_tidy$X564),
  PE4 = summary(PE4_tidy$X564)
)
# 2. Extract all unique statistic names (e.g., "Min.", "1st Qu.", "Median", etc.)
all_stats <- unique(unlist(lapply(summaries, names)))
# 3. Build a data.frame with rows = statistic names, columns = run names
summary_table <- data.frame(
  Statistic = all_stats,
  do.call(
    cbind,
    lapply(summaries, function(s) {
      s_named <- as.list(s)
      sapply(all_stats, function(k) s_named[[k]] %||% NA)
    })
  )
)
# 4. Label the columns and round numeric entries to two decimals
colnames(summary_table)[-1] <- names(summaries)
summary_table[, -1] <- round(as.data.frame(summary_table[, -1]), 2)
# 5. Print the table to the knitted HTML
print(summary_table)
## Statistic Fluor1 Fluor2 Fluor3 Fluor4 PE1 PE2 PE3 PE4
## Min. Min. 0.00 0.00 6.33 0.67 0.00 0.00 0.00 0.00
## 1st Qu. 1st Qu. 5425.00 2583.00 8456.58 2154.92 0.01 0.01 0.01 0.01
## Median Median 11356.00 6337.00 15958.83 4393.67 0.01 0.01 0.01 0.02
## Mean Mean 15257.96 8108.97 22470.98 7426.55 0.02 0.02 0.03 0.03
## 3rd Qu. 3rd Qu. 22447.25 10052.50 24916.33 10365.42 0.02 0.02 0.03 0.03
## Max. Max. 47848.00 45258.00 84905.33 38028.67 0.60 0.16 0.31 0.12
## NA's NA's NA NA 4.00 1.00 NA NA NA NA
# 6. Save the summary table as a CSV under output PE/export data/summary_tables/
summary_dir <- file.path("output PE", "export data", "summary_tables")
if (!dir.exists(summary_dir)) dir.create(summary_dir, recursive = TRUE)
write.csv(
  summary_table,
  file = file.path(summary_dir, "PE_Fluor_summary_stats.csv"),
  row.names = FALSE
)
message("Summary statistics saved to: ", file.path(summary_dir, "PE_Fluor_summary_stats.csv"))
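A note on the chunk above: the null-default operator `%||%` used in step 3 is not part of older base R (it was added to base in R 4.4.0; before that it ships with rlang and purrr). If neither applies to your setup, define it directly:

```r
# `%||%` returns the left-hand side unless it is NULL, otherwise the right.
`%||%` <- function(x, y) if (is.null(x)) y else x

list(a = 1)[["b"]] %||% NA   # NULL index, so this falls back to NA
```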
Results (from the table above):
Fluor1 (Xred): range 0 – 47,848, median ≈ 11,356
Fluor2 (Xred): range 0 – 45,258, median ≈ 6,337
Fluor3 (Xred): range 6.3 – 84,905, median ≈ 15,959 (4 NAs)
Fluor4 (Xred): range 0.7 – 38,029, median ≈ 4,394 (1 NA)
PE1 (X564): range 0.00 – 0.60, median ≈ 0.01
PE2 (X564): range 0.00 – 0.16, median ≈ 0.01
PE3 (X564): range 0.00 – 0.31, median ≈ 0.01
PE4 (X564): range 0.00 – 0.12, median ≈ 0.02
Next, we build absorbance_ratio_df, which has two columns:
- ratio_type (character/factor: "X455/X564" or "X592/X564")
- ratio_value (numeric: the computed ratio for each well)
We then jitter-plot ratio_value versus ratio_type and draw a horizontal dashed line at y = 1.0 (the expected "no-bias" line). The plot is saved to output PE/plots/absorbance/Absorbance_Ratio_X455_X592.png and displayed in the HTML.
# 1. Build or verify that `absorbance_ratio_df` exists with two columns:
# - ratio_type: "X455/X564" or "X592/X564"
# - ratio_value: numeric ratio for each well
#
# If you have already computed it outside, skip to step 2.
# Example code to create it (uncomment and adjust if needed):
#
# 1. Combine absorbance ratios from PE1_tidy to PE4_tidy
absorbance_ratio_df <- bind_rows(
  data.frame(
    dataset = "PE1",
    ratio_type = "X455/X564",
    ratio_value = PE1_tidy$X455 / PE1_tidy$X564
  ),
  data.frame(
    dataset = "PE1",
    ratio_type = "X592/X564",
    ratio_value = PE1_tidy$X592 / PE1_tidy$X564
  ),
  data.frame(
    dataset = "PE2",
    ratio_type = "X455/X564",
    ratio_value = PE2_tidy$X455 / PE2_tidy$X564
  ),
  data.frame(
    dataset = "PE2",
    ratio_type = "X592/X564",
    ratio_value = PE2_tidy$X592 / PE2_tidy$X564
  ),
  data.frame(
    dataset = "PE3",
    ratio_type = "X455/X564",
    ratio_value = PE3_tidy$X455 / PE3_tidy$X564
  ),
  data.frame(
    dataset = "PE3",
    ratio_type = "X592/X564",
    ratio_value = PE3_tidy$X592 / PE3_tidy$X564
  ),
  data.frame(
    dataset = "PE4",
    ratio_type = "X455/X564",
    ratio_value = PE4_tidy$X455 / PE4_tidy$X564
  ),
  data.frame(
    dataset = "PE4",
    ratio_type = "X592/X564",
    ratio_value = PE4_tidy$X592 / PE4_tidy$X564
  )
)
# 2. Build the jitter plot (ggplot2 provides ggplot() and friends)
library(ggplot2)
p_abs <- ggplot(absorbance_ratio_df, aes(x = ratio_type, y = ratio_value, color = dataset)) +
  geom_jitter(width = 0.1, alpha = 0.6) +
  geom_hline(yintercept = 1.0, linetype = "dashed", color = "red") +
  labs(
    title = "Absorbance Ratios: X455/X564 and X592/X564",
    x = "Absorbance Ratio Type",
    y = "Ratio Value"
  ) +
  theme_minimal()
# 3. Save to disk under output PE/plots/absorbance/
absorbance_dir <- file.path("output PE", "plots", "absorbance")
if (!dir.exists(absorbance_dir)) dir.create(absorbance_dir, recursive = TRUE)
ggsave(
  filename = file.path(absorbance_dir, "Absorbance_Ratio_X455_X592.png"),
  plot = p_abs,
  width = 10,
  height = 6,
  dpi = 300,
  bg = "white"
)
# 4. Print so it appears in the knitted HTML
print(p_abs)
The two panels show the distribution of absorbance ratios (X455/X564 on the left; X592/X564 on the right) for PE1–PE4 (colored points). A dashed horizontal line at 1.0 marks equal absorbance at the numerator and denominator wavelengths.
X455/X564 ratios span roughly 0.5 – 5.0 across all four datasets, with most values clustered between 1.0 and 3.0. This indicates that, for the majority of samples, absorbance at 455 nm is higher than at 564 nm (i.e., ratio > 1), although there are also some below 1.0.
X592/X564 ratios are much tighter, clustering mostly between 0.8 and 1.2 (nearly all below or near the dashed 1.0 line). In other words, absorbance at 592 nm tends to be equal to or lower than at 564 nm for nearly all PE1–PE4 samples.
Description of joindf_by_id Function
The joindf_by_id function takes two data frames
(df1 and df2) and merges them by matching
unique identifiers related to samples, specifically using either a
Cell_ID column or a plate well column.
Key steps and features:
- Column Cleaning: Trims whitespace from column names in both data frames to avoid join errors caused by accidental spaces.
- Key Column Verification: Checks that at least one data frame contains a Cell_ID column and the other contains a plate well column; these serve as the join keys.
- Role Assignment: Depending on which data frame contains Cell_ID, that data frame is assigned as the base (df_cell), and the other becomes the joining data (df_plate).
- Rename Join Keys: Renames both join columns to a common key name (join_id) to facilitate a straightforward left join.
- Perform Join: Conducts a left join, keeping all rows from the base data frame and adding matching data from the other.
- Identify Unmatched Rows: Any rows in the larger data frame without matches are saved separately for troubleshooting.
- Output Files: Saves the merged data frame as a CSV named according to the provided output_name, and writes unmatched rows into a separate CSV file.
- Global Environment Assignment: Assigns the merged data frame into the global R environment under the same name as the output file (minus the .csv extension).
- Reporting: Prints messages listing any unmatched identifiers and returns a summary report containing counts of matched/unmatched rows and file paths of saved CSVs.
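The steps above can be sketched in a few lines. This is a hedged sketch, not the code from process_plate_run.R: joindf_sketch is a hypothetical name, the CSV writing and global-environment assignment are omitted, and base merge() stands in for the left join.

```r
# Minimal sketch of joindf_by_id()'s core logic (real implementation lives
# in process_plate_run.R).
joindf_sketch <- function(df1, df2, key_df1 = "Cell_ID", key_df2 = "plate well") {
  # Column cleaning: trim accidental whitespace from names
  names(df1) <- trimws(names(df1))
  names(df2) <- trimws(names(df2))
  # Rename both join keys to a common name
  names(df1)[names(df1) == key_df1] <- "join_id"
  names(df2)[names(df2) == key_df2] <- "join_id"
  # Left join: keep every row of the base data frame
  merged <- merge(df1, df2, by = "join_id", all.x = TRUE)
  # Rows of df2 with no partner in df1, kept for troubleshooting
  unmatched <- df2[!df2$join_id %in% df1$join_id, , drop = FALSE]
  list(result_df = merged,
       merged_cells = nrow(merged),
       unmatched_cells = nrow(unmatched))
}
```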
# 1. Create output subdirectory
save_dir <- file.path("output_PE", "export data", "joined_weights_PE")
dir.create(save_dir, recursive = TRUE, showWarnings = FALSE)
# 2. Build weight and PE data frame lists
list_weights <- list(
  pe1_weights_id,
  pe2_weights_id,
  pe3_weights_id,
  pe4_weights_id
)
PE_list <- list(
  PE1 = PE1_tidy,
  PE2 = PE2_tidy,
  PE3 = PE3_tidy,
  PE4 = PE4_tidy
)
# 3. Loop and join
mapply(
  function(df1, df2, name) {
    joindf_by_id(
      df1 = df1,
      df2 = df2,
      output_name = file.path(save_dir, paste0(name, "_weights_joined.csv")),
      unmatched_out = file.path(save_dir, paste0(name, "_weights_unmatched.csv")),
      key_df1 = "Cell_ID",
      key_df2 = "plate well"
    )
  },
  df1 = PE_list,
  df2 = list_weights,
  name = names(PE_list),
  SIMPLIFY = FALSE
)
##
## Values needing repetition:
## A01 needs to be repeated
## C01 needs to be repeated
## D07 needs to be repeated
## E01 needs to be repeated
## F01 needs to be repeated
## F07 needs to be repeated
## G01 needs to be repeated
## H07 needs to be repeated
##
## Values needing repetition:
## A02 needs to be repeated
## A03 needs to be repeated
## A07 needs to be repeated
## A09 needs to be repeated
## A11 needs to be repeated
## B03 needs to be repeated
## D07 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G07 needs to be repeated
##
## Values needing repetition:
## D07 needs to be repeated
## D08 needs to be repeated
## G01 needs to be repeated
## H11 needs to be repeated
##
## Values needing repetition:
## A01 needs to be repeated
## A02 needs to be repeated
## A03 needs to be repeated
## A04 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A08 needs to be repeated
## A09 needs to be repeated
## A10 needs to be repeated
## A11 needs to be repeated
## A12 needs to be repeated
## B01 needs to be repeated
## B02 needs to be repeated
## B03 needs to be repeated
## B04 needs to be repeated
## B05 needs to be repeated
## B06 needs to be repeated
## B08 needs to be repeated
## B11 needs to be repeated
## C01 needs to be repeated
## C02 needs to be repeated
## C03 needs to be repeated
## C04 needs to be repeated
## C05 needs to be repeated
## C06 needs to be repeated
## C07 needs to be repeated
## C08 needs to be repeated
## C09 needs to be repeated
## C12 needs to be repeated
## D03 needs to be repeated
## D04 needs to be repeated
## D06 needs to be repeated
## D08 needs to be repeated
## D09 needs to be repeated
## D10 needs to be repeated
## D11 needs to be repeated
## D12 needs to be repeated
## E06 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F06 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## F10 needs to be repeated
## F11 needs to be repeated
## F12 needs to be repeated
## G01 needs to be repeated
## G07 needs to be repeated
## G08 needs to be repeated
## G09 needs to be repeated
## $PE1
## $PE1$result_df
## # A tibble: 88 × 11
## join_id X455 X564 X592 X618 X645 X565 ID `sample weight`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 A02 0.03 0.018 0.012 0.012 0.009 0.017 Lab_01 10.4
## 2 A03 0.029 0.019 0.012 0.011 0.008 0.019 Lab_01 10.3
## 3 A04 0.022 0.013 0.009 0.009 0.007 0.013 Lab_01 11.1
## 4 A05 0.039 0.02 0.017 0.015 0.011 0.017 Lab_02 10.4
## 5 A06 0.015 0.007 0.0050 0.00400 0.00200 0.0050 Lab_02 10.2
## 6 A07 0.022 0.0100 0.008 0.007 0.00400 0.00900 Lab_02 10.3
## 7 A08 0.015 0.009 0.006 0.0050 0.00300 0.008 Lab_03 10.6
## 8 A09 0.029 0.012 0.007 0.007 0.00300 0.01 Lab_03 10.5
## 9 A10 0.063 0.053 0.044 0.04 0.035 0.048 Lab_03 10.1
## 10 A11 0.017 0.007 0.006 0.0050 0.00300 0.007 Lab_04 10.4
## # ℹ 78 more rows
## # ℹ 2 more variables: `Tray well` <chr>, date <dttm>
##
## $PE1$merged_cells
## [1] 88
##
## $PE1$unmatched_cells
## [1] 8
##
## $PE1$unmatched_saved_to
## [1] "output_PE/export data/joined_weights_PE/PE1_weights_unmatched.csv"
##
## $PE1$assigned_to_global
## [1] TRUE
##
##
## $PE2
## $PE2$result_df
## # A tibble: 86 × 9
## join_id X564 X592 X455 X565 ID `sample weight` `Tray well`
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 A01 0.00233 0.00533 0.00633 0.00133 Blank01 NA A1
## 2 A04 0.0137 0.00967 0.0167 0.0127 Lab_01 24.5 A4
## 3 A05 0.118 0.131 0.199 0.159 Lab_01 24.4 A5
## 4 A06 0.0103 0.00433 0.0163 0.0103 Lab_01 24.1 A6
## 5 A08 0.00367 0.000667 0.0127 0.00367 Lab_02 23.7 A8
## 6 A10 0.0103 0.00533 0.0213 0.00933 Lab_03 23 A10
## 7 A12 0.00867 0.00267 0.0127 0.00867 Lab_03 26.9 A12
## 8 B01 0.0163 0.00733 0.0213 0.0153 Lab_04 23.6 A13
## 9 B02 0.00933 0.00533 0.0253 0.00933 Lab_04 24.2 A14
## 10 B04 0.00467 0.00167 0.0117 0.00467 Lab_05 26.4 A16
## # ℹ 76 more rows
## # ℹ 1 more variable: date <dttm>
##
## $PE2$merged_cells
## [1] 86
##
## $PE2$unmatched_cells
## [1] 10
##
## $PE2$unmatched_saved_to
## [1] "output_PE/export data/joined_weights_PE/PE2_weights_unmatched.csv"
##
## $PE2$assigned_to_global
## [1] TRUE
##
##
## $PE3
## $PE3$result_df
## # A tibble: 92 × 9
## join_id X564 X565 X592 X455 ID `sample weight` `Tray well`
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 A01 0.0193 0.0193 0.0123 0.0343 Lab_40 26.8 G1
## 2 A02 0.0133 0.0133 0.00633 0.0253 Lab_40 24.2 G2
## 3 A03 0.022 0.022 0.012 0.039 Lab_40 26.7 G3
## 4 A04 0.0193 0.0193 0.00433 0.0243 Lab_41 23.4 G4
## 5 A05 0.0283 0.0273 0.0123 0.0263 Lab_41 25.6 G5
## 6 A06 0.0343 0.0333 0.0153 0.0503 Lab_41 27 G6
## 7 A07 0.018 0.018 0.007 0.028 Lab_42 25 G7
## 8 A08 0.0163 0.0163 0.00333 0.0233 Lab_42 25.7 G8
## 9 A09 0.0223 0.0223 0.0103 0.0323 Lab_42 24.6 G9
## 10 A10 0.0603 0.0593 0.0243 0.0403 Lab_43 25.6 G10
## # ℹ 82 more rows
## # ℹ 1 more variable: date <dttm>
##
## $PE3$merged_cells
## [1] 88
##
## $PE3$unmatched_cells
## [1] 4
##
## $PE3$unmatched_saved_to
## [1] "output_PE/export data/joined_weights_PE/PE3_weights_unmatched.csv"
##
## $PE3$assigned_to_global
## [1] TRUE
##
##
## $PE4
## $PE4$result_df
## # A tibble: 34 × 11
## join_id X564 X592 X455 X618 X645 `X592[2]` ID
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A06 0.026 0.008 0.038 0.0050 0.00200 0.007 Lab_03
## 2 B07 0.00867 0.00467 0.0217 0.00367 0.00167 0.00367 Lab_17
## 3 B09 0.00767 0.00467 0.0197 0.00367 0.00167 0.00367 Lab_18
## 4 B10 0.034 0.028 0.052 0.026 0.023 0.028 Lab_18
## 5 B12 0.01 0.006 0.022 0.0050 0.00200 0.006 Lab_19
## 6 C10 0.01 0.0050 0.02 0.0050 0.00100 0.00400 Lab_35
## 7 C11 0.0367 0.0307 0.0477 0.0287 0.0277 0.0307 Lab_36
## 8 D01 0.0347 0.0107 0.0277 0.00867 0.00367 0.0107 Lab_37
## 9 D02 0.0637 0.0327 0.0617 0.0287 0.0217 0.0317 Lab_37
## 10 D05 0.118 0.114 0.125 0.109 0.108 0.117 Lab_40
## # ℹ 24 more rows
## # ℹ 3 more variables: `sample weight` <dbl>, `tray well` <dbl>, date <chr>
##
## $PE4$merged_cells
## [1] 31
##
## $PE4$unmatched_cells
## [1] 50
##
## $PE4$unmatched_saved_to
## [1] "output_PE/export data/joined_weights_PE/PE4_weights_unmatched.csv"
##
## $PE4$assigned_to_global
## [1] TRUE
Each PE#_weights_joined data frame has been processed to compute PE_mg_per_mL, total_PE_mg, and PE_mg_per_g_sample.
Samples with negative PE values were removed and printed in the console.
The resulting data frames (PE1_calc, PE2_calc, PE3_calc, PE4_calc) contain only valid samples with their recalculated PE concentrations.
## STEP 6: Calculate PE Content
The calculate_pe_and_filter() function takes a tidy data frame containing
absorbance measurements and calculates the concentration of phycoerythrin
(PE) for each sample, using the equation from Beer S, Eshel A (1985)
Determining phycoerythrin and phycocyanin concentrations in aqueous crude
extracts of red algae. Aust J Mar Freshwater Res 36:785–792,
https://doi.org/10.1071/MF9850785.
It performs the following steps:
Filtering Negative PE Values:
Samples with negative PE concentrations are identified and removed.
These removed samples are printed in the console with the reason for
removal.
Normalization to Sample Weight:
The PE concentration is converted from micrograms per milliliter (µg/mL)
to milligrams per gram of dry sample (mg/g) by adjusting for the extract
volume and the individual sample weights provided in the data frame.
This ensures accurate PE quantification based on the exact weight of
each sample rather than using a fixed default weight.
Output and Metadata:
The function returns a filtered data frame containing valid samples with
their calculated PE concentrations in mg/g. Additionally, it stores the
filtered-out rows (samples with negative PE) as an attribute called
"removed_rows_pe" for further inspection if
needed.
This process helps ensure the quality and accuracy of PE measurements by excluding invalid data and normalizing concentrations based on actual sample weights.
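The equation itself is not printed in this document. For reference, the commonly cited form of the Beer & Eshel (1985) PE equation is sketched below; this is a reference sketch with a hypothetical helper name (pe_mg_per_ml), not the code from process_plate_run.R.

```r
# Commonly cited Beer & Eshel (1985) equation for phycoerythrin:
#   PE (mg/mL) = [(A564 - A592) - (A455 - A592) * 0.20] * 0.12
# Column names (X564, X592, X455) match the tidied data frames above.
pe_mg_per_ml <- function(A564, A592, A455) {
  ((A564 - A592) - (A455 - A592) * 0.20) * 0.12
}

# One well's corrected absorbances (values from the first PE1 row above)
pe_mg_per_ml(A564 = 0.018, A592 = 0.012, A455 = 0.030)  # 0.000288 mg/mL
# Wells where this comes out negative are the ones filtered and reported.
```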
# Create export subdirectory for filtered PE data
pe_filtered_dir <- file.path("output PE", "export data", "pe filtered")
dir.create(pe_filtered_dir, recursive = TRUE, showWarnings = FALSE)
# List of input dataframes
joined_data <- list(
  PE1 = PE1_weights_joined,
  PE2 = PE2_weights_joined,
  PE3 = PE3_weights_joined,
  PE4 = PE4_weights_joined
)
# Run calculate_pe_and_filter() for each dataset
invisible(
  mapply(
    function(df, name) {
      calculate_pe_and_filter(
        tidy_df = df,
        output_basename = paste0(name, "_calc"),
        sample_id_col = "join_id",
        sample_weight_col = "sample weight" # column name as it appears after the join
      )
    },
    df = joined_data,
    name = names(joined_data),
    SIMPLIFY = FALSE
  )
)
)
## # A tibble: 26 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 A05 -0.000168 removed because PE < 0
## 2 A07 -0.0000960 removed because PE < 0
## 3 A11 -0.000144 removed because PE < 0
## 4 A12 -0.0000480 removed because PE < 0
## 5 B01 -0.0000960 removed because PE < 0
## 6 B05 -0.00106 removed because PE < 0
## 7 B09 -0.0000240 removed because PE < 0
## 8 C07 -0.0000240 removed because PE < 0
## 9 D10 -0.000216 removed because PE < 0
## 10 E04 -0.0000480 removed because PE < 0
## # ℹ 16 more rows
## # A tibble: 12 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 A01 -0.000384 removed because PE < 0
## 2 A05 -0.00319 removed because PE < 0
## 3 D01 -0.0000480 removed because PE < 0
## 4 D03 -0.0000240 removed because PE < 0
## 5 E01 -0.000120 removed because PE < 0
## 6 E03 -0.0000960 removed because PE < 0
## 7 E04 -0.000552 removed because PE < 0
## 8 E06 -0.000144 removed because PE < 0
## 9 E07 -0.0000720 removed because PE < 0
## 10 E09 -0.0000480 removed because PE < 0
## 11 F01 -0.00175 removed because PE < 0
## 12 G04 -0.0000240 removed because PE < 0
## # A tibble: 12 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 E09 -0.0000720 removed because PE < 0
## 2 E10 -0.0000720 removed because PE < 0
## 3 E12 -0.0000720 removed because PE < 0
## 4 F03 -0.000216 removed because PE < 0
## 5 F05 -0.0000240 removed because PE < 0
## 6 G02 -0.000120 removed because PE < 0
## 7 G04 -0.0000960 removed because PE < 0
## 8 G05 -0.0000720 removed because PE < 0
## 9 G12 -0.0000720 removed because PE < 0
## 10 H03 -0.0000480 removed because PE < 0
## 11 H09 -0.0000480 removed because PE < 0
## 12 H10 -0.0000240 removed because PE < 0
## # A tibble: 1 × 3
## join_id PE_mg_per_mL Removal_Reason
## <chr> <dbl> <chr>
## 1 G12 -0.0000480 removed because PE < 0
For each cleaned Fluorescence dataset (Fluor1_tidy,
Fluor2_tidy, Fluor3_tidy,
Fluor4_tidy), we create a bar chart of raw
Xred fluorescence values by sample. For each Fluor#_tidy, we build a ggplot bar chart where:
- The x-axis is Cell_ID (the individual sample/well).
- The y-axis is Xred (the fluorescence value).
- Each bar is labeled with its Xred value (rounded to three decimals).
We then save a static PNG named <Fluor#>_fluorescence.png for each run, convert each chart to an interactive plot with plotly::ggplotly() and store it in a list, and finally use htmltools::tagList() to render all interactive plots together in the knitted HTML output.
# 0. Setup output directory for plots
plot_dir <- file.path("output PE", "plots", "fluorescence_xred")
dir.create(plot_dir, recursive = TRUE, showWarnings = FALSE)
# 1. Create a named list of the four cleaned fluorescence data frames
fluor_list <- list(
  Fluor1 = Fluor1_tidy,
  Fluor2 = Fluor2_tidy,
  Fluor3 = Fluor3_tidy,
  Fluor4 = Fluor4_tidy
)
plots <- list() # Initialize an empty list to hold interactive plots
library(plotly) # provides ggplotly() used below
# 2. Loop through each Fluorescence run and generate plots
for (name in names(fluor_list)) {
  df <- fluor_list[[name]]
  # 2a. Build a ggplot bar chart of Xred by Cell_ID
  p <- ggplot(df, aes(x = Cell_ID, y = Xred)) +
    geom_bar(stat = "identity", fill = "skyblue") +
    geom_text(aes(label = round(Xred, 3)), vjust = -0.5, size = 3) +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)
    ) +
    labs(
      title = paste("Raw Xred Fluorescence –", name),
      x = "Sample (Cell_ID)",
      y = "Xred Fluorescence"
    )
  # 2b. Save the static PNG version to the subdirectory
  filename <- file.path(plot_dir, paste0(name, "_fluorescence.png"))
  ggsave(
    filename = filename,
    plot = p,
    width = 10,
    height = 6,
    dpi = 300,
    bg = "white"
  )
  # 2c. Convert to an interactive Plotly object and store
  plots[[name]] <- ggplotly(p)
}
# 3. Display all interactive plots in the R Markdown HTML output
library(htmltools)
tagList(plots)
Each Fluorescence run’s bar chart shows the raw Xred value for every sample (Cell_ID).
Static PNG files (Fluor1_fluorescence.png, Fluor2_fluorescence.png, etc.) have been saved under output PE/plots/fluorescence_xred/.
Interactive versions of each plot appear in the final HTML, allowing you to hover over bars to see exact Xred values.
Next, we join each Fluorescence dataset (Fluor#_tidy) with its corresponding PE-calculation dataset (PE#_calc), so that each sample's PE and Xred values are in the same table. The steps:
- Call joindf_by_id() for each run: PE1_calc with Fluor1_tidy on join_id = Cell_ID; PE2_calc with Fluor2_tidy; and so on. Each merged table is written to CSV (pe_fluor1_joined.csv, etc.) and assigned to a global object (pe_fluor1, etc.).
- Collect pe_fluor1, pe_fluor2, pe_fluor3, pe_fluor4 in a list called pe_fluor_all.
- Add a run column to each data frame: "run1", "run2", "run3", and "run4".
- For any column named "Date" or whose name contains "date", attempt to parse as "%m/%d/%Y" first; if that fails, retry as "%d/%m/%Y".
- Compute PE_scaled = PE_mg_per_g_sample * 1000.
- Bind all runs into a single data frame, combined_df, containing all runs, a run factor, consistent date columns, and PE_scaled.
# Create subdirectory for this join group
pe_fluor_dir <- file.path("output PE", "export data", "pe_fluor_joins")
dir.create(pe_fluor_dir, recursive = TRUE, showWarnings = FALSE)
# Run joins with explicit output paths
joindf_by_id(
  df1 = PE1_calc,
  df2 = Fluor1_tidy,
  output_name = file.path(pe_fluor_dir, "pe_fluor1_joined.csv"),
  unmatched_out = file.path(pe_fluor_dir, "pe_fluor1_unmatched.csv"),
  key_df1 = "join_id",
  key_df2 = "Cell_ID"
)
##
## Values needing repetition:
## A01 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A11 needs to be repeated
## A12 needs to be repeated
## B01 needs to be repeated
## B05 needs to be repeated
## B09 needs to be repeated
## C01 needs to be repeated
## C07 needs to be repeated
## D07 needs to be repeated
## D10 needs to be repeated
## E01 needs to be repeated
## E04 needs to be repeated
## E08 needs to be repeated
## E09 needs to be repeated
## E10 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## G01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G05 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## H05 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## $result_df
## # A tibble: 62 × 15
## join_id X455 X564 X592 X618 X645 X565 ID `sample weight`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 A02 0.03 0.018 0.012 0.012 0.009 0.017 Lab_01 10.4
## 2 A03 0.029 0.019 0.012 0.011 0.008 0.019 Lab_01 10.3
## 3 A04 0.022 0.013 0.009 0.009 0.007 0.013 Lab_01 11.1
## 4 A06 0.015 0.007 0.0050 0.00400 0.00200 0.0050 Lab_02 10.2
## 5 A08 0.015 0.009 0.006 0.0050 0.00300 0.008 Lab_03 10.6
## 6 A09 0.029 0.012 0.007 0.007 0.00300 0.01 Lab_03 10.5
## 7 A10 0.063 0.053 0.044 0.04 0.035 0.048 Lab_03 10.1
## 8 B02 0.018 0.008 0.0050 0.00400 0.00300 0.008 Lab_05 10.9
## 9 B03 0.028 0.016 0.013 0.012 0.01 0.016 Lab_05 11.4
## 10 B04 0.019 0.008 0.0050 0.00400 0.00300 0.008 Lab_05 9.9
## # ℹ 52 more rows
## # ℹ 6 more variables: `Tray well` <chr>, date <dttm>, PE_mg_per_mL <dbl>,
## # total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
##
## $merged_cells
## [1] 62
##
## $unmatched_cells
## [1] 34
##
## $unmatched_saved_to
## [1] "output PE/export data/pe_fluor_joins/pe_fluor1_unmatched.csv"
##
## $assigned_to_global
## [1] TRUE
joindf_by_id(
  df1 = PE2_calc,
  df2 = Fluor2_tidy,
  output_name = file.path(pe_fluor_dir, "pe_fluor2_joined.csv"),
  unmatched_out = file.path(pe_fluor_dir, "pe_fluor2_unmatched.csv"),
  key_df1 = "join_id",
  key_df2 = "Cell_ID"
)
##
## Values needing repetition:
## A01 needs to be repeated
## A03 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A09 needs to be repeated
## A11 needs to be repeated
## B03 needs to be repeated
## D01 needs to be repeated
## D03 needs to be repeated
## D07 needs to be repeated
## E01 needs to be repeated
## E03 needs to be repeated
## E04 needs to be repeated
## E06 needs to be repeated
## E07 needs to be repeated
## E09 needs to be repeated
## F01 needs to be repeated
## G02 needs to be repeated
## G03 needs to be repeated
## G04 needs to be repeated
## G07 needs to be repeated
## $result_df
## # A tibble: 74 × 13
## join_id X564 X592 X455 X565 ID `sample weight` `Tray well`
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 A04 0.0137 0.00967 0.0167 0.0127 Lab_01 24.5 A4
## 2 A06 0.0103 0.00433 0.0163 0.0103 Lab_01 24.1 A6
## 3 A08 0.00367 0.000667 0.0127 0.00367 Lab_02 23.7 A8
## 4 A10 0.0103 0.00533 0.0213 0.00933 Lab_03 23 A10
## 5 A12 0.00867 0.00267 0.0127 0.00867 Lab_03 26.9 A12
## 6 B01 0.0163 0.00733 0.0213 0.0153 Lab_04 23.6 A13
## 7 B02 0.00933 0.00533 0.0253 0.00933 Lab_04 24.2 A14
## 8 B04 0.00467 0.00167 0.0117 0.00467 Lab_05 26.4 A16
## 9 B05 0.0123 0.00733 0.0223 0.0113 Lab_05 24.4 B01
## 10 B06 0.0643 0.0533 0.0763 0.0603 Lab_05 24.4 B02
## # ℹ 64 more rows
## # ℹ 5 more variables: date <dttm>, PE_mg_per_mL <dbl>, total_PE_mg <dbl>,
## # PE_mg_per_g_sample <dbl>, Xred <dbl>
##
## $merged_cells
## [1] 74
##
## $unmatched_cells
## [1] 21
##
## $unmatched_saved_to
## [1] "output PE/export data/pe_fluor_joins/pe_fluor2_unmatched.csv"
##
## $assigned_to_global
## [1] TRUE
joindf_by_id(
  df1 = PE3_calc,
  df2 = Fluor3_tidy,
  output_name = file.path(pe_fluor_dir, "pe_fluor3_joined.csv"),
  unmatched_out = file.path(pe_fluor_dir, "pe_fluor3_unmatched.csv"),
  key_df1 = "join_id",
  key_df2 = "Cell_ID"
)
##
## Values needing repetition:
## E06 needs to be repeated
## E07 needs to be repeated
## E08 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F04 needs to be repeated
## F06 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## F09 needs to be repeated
## F10 needs to be repeated
## F11 needs to be repeated
## F12 needs to be repeated
## G03 needs to be repeated
## G06 needs to be repeated
## G07 needs to be repeated
## G08 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## G11 needs to be repeated
## H01 needs to be repeated
## H02 needs to be repeated
## H04 needs to be repeated
## H05 needs to be repeated
## H06 needs to be repeated
## H07 needs to be repeated
## H08 needs to be repeated
## H12 needs to be repeated
## $result_df
## # A tibble: 58 × 13
## join_id Xred X564 X565 X592 X455 ID `sample weight` `Tray well`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 A01 8004. 0.0193 0.0193 0.0123 0.0343 Lab_… 26.8 G1
## 2 A02 7147. 0.0133 0.0133 0.00633 0.0253 Lab_… 24.2 G2
## 3 A03 10370. 0.022 0.022 0.012 0.039 Lab_… 26.7 G3
## 4 A04 20048. 0.0193 0.0193 0.00433 0.0243 Lab_… 23.4 G4
## 5 A05 21192. 0.0283 0.0273 0.0123 0.0263 Lab_… 25.6 G5
## 6 A06 22423. 0.0343 0.0333 0.0153 0.0503 Lab_… 27 G6
## 7 A07 12791. 0.018 0.018 0.007 0.028 Lab_… 25 G7
## 8 A08 14773. 0.0163 0.0163 0.00333 0.0233 Lab_… 25.7 G8
## 9 A09 14830. 0.0223 0.0223 0.0103 0.0323 Lab_… 24.6 G9
## 10 A10 37830. 0.0603 0.0593 0.0243 0.0403 Lab_… 25.6 G10
## # ℹ 48 more rows
## # ℹ 4 more variables: date <dttm>, PE_mg_per_mL <dbl>, total_PE_mg <dbl>,
## # PE_mg_per_g_sample <dbl>
##
## $merged_cells
## [1] 52
##
## $unmatched_cells
## [1] 28
##
## $unmatched_saved_to
## [1] "output PE/export data/pe_fluor_joins/pe_fluor3_unmatched.csv"
##
## $assigned_to_global
## [1] TRUE
joindf_by_id(
df1 = PE4_calc,
df2 = Fluor4_tidy,
output_name = file.path(pe_fluor_dir, "pe_fluor4_joined.csv"),
unmatched_out = file.path(pe_fluor_dir, "pe_fluor4_unmatched.csv"),
key_df1 = "join_id",
key_df2 = "Cell_ID"
)
##
## Values needing repetition:
## A01 needs to be repeated
## A02 needs to be repeated
## A03 needs to be repeated
## A04 needs to be repeated
## A05 needs to be repeated
## A07 needs to be repeated
## A08 needs to be repeated
## A09 needs to be repeated
## A10 needs to be repeated
## A11 needs to be repeated
## A12 needs to be repeated
## B01 needs to be repeated
## B02 needs to be repeated
## B03 needs to be repeated
## B04 needs to be repeated
## B05 needs to be repeated
## B06 needs to be repeated
## B08 needs to be repeated
## B11 needs to be repeated
## C01 needs to be repeated
## C02 needs to be repeated
## C03 needs to be repeated
## C04 needs to be repeated
## C05 needs to be repeated
## C06 needs to be repeated
## C07 needs to be repeated
## C08 needs to be repeated
## C09 needs to be repeated
## C12 needs to be repeated
## D03 needs to be repeated
## D04 needs to be repeated
## D06 needs to be repeated
## D08 needs to be repeated
## D09 needs to be repeated
## D10 needs to be repeated
## D11 needs to be repeated
## D12 needs to be repeated
## E06 needs to be repeated
## E11 needs to be repeated
## F01 needs to be repeated
## F06 needs to be repeated
## F07 needs to be repeated
## F08 needs to be repeated
## F10 needs to be repeated
## F11 needs to be repeated
## F12 needs to be repeated
## G01 needs to be repeated
## G09 needs to be repeated
## G10 needs to be repeated
## G11 needs to be repeated
## G12 needs to be repeated
## H02 needs to be repeated
## H06 needs to be repeated
## H07 needs to be repeated
## H10 needs to be repeated
## $result_df
## # A tibble: 33 × 15
## join_id X564 X592 X455 X618 X645 `X592[2]` ID
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A06 0.026 0.008 0.038 0.0050 0.00200 0.007 Lab_03
## 2 B07 0.00867 0.00467 0.0217 0.00367 0.00167 0.00367 Lab_17
## 3 B09 0.00767 0.00467 0.0197 0.00367 0.00167 0.00367 Lab_18
## 4 B10 0.034 0.028 0.052 0.026 0.023 0.028 Lab_18
## 5 B12 0.01 0.006 0.022 0.0050 0.00200 0.006 Lab_19
## 6 C10 0.01 0.0050 0.02 0.0050 0.00100 0.00400 Lab_35
## 7 C11 0.0367 0.0307 0.0477 0.0287 0.0277 0.0307 Lab_36
## 8 D01 0.0347 0.0107 0.0277 0.00867 0.00367 0.0107 Lab_37
## 9 D02 0.0637 0.0327 0.0617 0.0287 0.0217 0.0317 Lab_37
## 10 D05 0.118 0.114 0.125 0.109 0.108 0.117 Lab_40
## # ℹ 23 more rows
## # ℹ 7 more variables: `sample weight` <dbl>, `tray well` <dbl>, date <chr>,
## # PE_mg_per_mL <dbl>, total_PE_mg <dbl>, PE_mg_per_g_sample <dbl>, Xred <dbl>
##
## $merged_cells
## [1] 32
##
## $unmatched_cells
## [1] 55
##
## $unmatched_saved_to
## [1] "output PE/export data/pe_fluor_joins/pe_fluor4_unmatched.csv"
##
## $assigned_to_global
## [1] TRUE
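joindf_by_id() is defined in process_plate_run.R and not shown here. Judging from its printed output (result_df, merged_cells, unmatched_cells, unmatched_saved_to), its core merge step might look roughly like this sketch — join_sketch and the two toy frames are hypothetical, not the real implementation:

```r
# Hypothetical sketch of the joindf_by_id() merge step: inner-join on the two
# key columns, collect rows of df1 with no partner in df2, and report counts.
join_sketch <- function(df1, df2, key_df1, key_df2) {
  merged <- merge(df1, df2, by.x = key_df1, by.y = key_df2)
  unmatched <- df1[!df1[[key_df1]] %in% df2[[key_df2]], ]
  list(result_df = merged,
       merged_cells = nrow(merged),
       unmatched_cells = nrow(unmatched))
}

# Toy example: two wells match, one does not
pe <- data.frame(join_id = c("A01", "A02", "A03"), PE = c(0.1, 0.2, 0.3))
fl <- data.frame(Cell_ID = c("A01", "A02"), Xred = c(100, 200))
res <- join_sketch(pe, fl, "join_id", "Cell_ID")
```

The real function additionally writes the matched and unmatched rows to the CSV paths you pass in and assigns the result to the global environment.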
# Rename columns in one of the joined results (after it has been auto-assigned)
#colnames(pe_fluor4_joined)[colnames(pe_fluor4_joined) == "sample ID"] <- "ID"
# Combine all joined dataframes into a list; bind_rows(.id = "run") below
# labels each row with its list position ("1"-"4"), so no separate run
# column needs to be added first
pe_fluor_all <- list(pe_fluor1_joined, pe_fluor2_joined, pe_fluor3_joined, pe_fluor4_joined)
# Find common columns across all runs
common_cols <- Reduce(intersect, lapply(pe_fluor_all, names))
# Optional: harmonize date columns if needed. Note that as.Date() returns NA
# (rather than throwing an error) on a format mismatch, so we cascade formats
# by testing for NA instead of relying on tryCatch()
pe_fluor_all <- lapply(pe_fluor_all, function(df) {
  date_cols <- grep("date", names(df), ignore.case = TRUE, value = TRUE)
  for (nm in date_cols) {
    if (is.character(df[[nm]])) {
      parsed <- as.Date(df[[nm]], format = "%m/%d/%Y")
      if (all(is.na(parsed))) parsed <- as.Date(df[[nm]], format = "%d/%m/%Y")
      df[[nm]] <- parsed
    }
  }
  df
})
# Keep only the common columns in each dataframe before binding
combined_df <- bind_rows(
lapply(pe_fluor_all, function(df) df[common_cols]),
.id = "run"
)
# Finally, merge all spreadsheets into one for replicate testing and analysis.
# Scale PE up for regression purposes
combined_df <- combined_df %>%
mutate(PE_scaled = PE_mg_per_g_sample * 1000)
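The combine pattern above can be illustrated with a minimal base-R sketch (toy data frames, not the real runs): keep only the columns shared by every run, then stack rows with a run identifier — the same role `bind_rows(.id = "run")` plays.

```r
# Toy illustration: intersect column names across runs, subset each run to the
# shared columns, then stack with a run label.
df1 <- data.frame(join_id = c("A01", "A02"), Xred = c(1.2, 3.4), extra = 1)
df2 <- data.frame(join_id = c("B01", "B02"), Xred = c(5.6, 7.8))
runs <- list(df1, df2)

common <- Reduce(intersect, lapply(runs, names))   # columns present in every run
stacked <- do.call(rbind, Map(function(df, i) {
  out <- df[common]
  out$run <- as.character(i)                       # run label from list position
  out
}, runs, seq_along(runs)))
```

Here `extra` is dropped because it appears only in the first run, which is exactly what happens to run-specific columns (such as the fluorescence channels present in only some plates) in the real pipeline.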
Each run’s PE + Fluorescence data was merged and saved (pe_fluor1, …, pe_fluor4).
combined_df now contains all four runs, only columns common to every run, a run factor, fixed date parsing, and a new column PE_scaled ready for regression analyses.
Plain English:
- Plot the relationship (Xred vs. PE_mg_per_g_sample) across every run in a single scatterplot.
- Color the points by the run factor, so we can see how the different runs overlap or differ.
- Save the plot as Xred_vs_PE_all_runs.png and display it in the HTML.
# Build a scatterplot of Xred vs. PE_mg_per_g_sample, colored by run
# Define directory for this plot group
pe_fluor_plot_dir <- file.path("output PE", "plots", "pe_fluor")
dir.create(pe_fluor_plot_dir, recursive = TRUE, showWarnings = FALSE)
# Create combined plot
p_all <- ggplot(combined_df, aes(x = PE_mg_per_g_sample, y = Xred, color = run)) +
geom_point(alpha = 0.6) +
theme_minimal() +
labs(
title = "Xred vs. PE_mg_per_g_sample (All Runs)",
x = "PE (mg/g sample)",
y = "Xred Fluorescence",
color = "Run"
) +
scale_color_brewer(palette = "Dark2") +
coord_cartesian(xlim = c(0, 0.5))
# Save the plot to the new subdirectory
ggsave(
filename = file.path(pe_fluor_plot_dir, "Xred_vs_PE_all_runs.png"),
plot = p_all,
width = 10,
height = 6,
dpi = 300,
bg = "white"
)
# Display the plot in the knitted HTML
print(p_all)
The scatterplot shows how Xred (fluorescence) increases with PE_mg_per_g_sample (phycoerythrin content), with each run distinguished by color.
Most points from every run lie between 0 and 0.5 mg/g on the x-axis, confirming that the hard-coded limit captures the bulk of the data.
Outliers beyond PE = 0.5 mg/g are clipped from view but still exist in the dataset.
Plain English:
- Remove unwanted rows from combined_df.
- Add a new factor level "fresh run 3 & 4" to the run column.
- Assign "fresh run 3 & 4" to the re-run samples in rows 169–189 and 209–213, then drop two further rows (adjust indices if your row count shifts).
# 1. Remove unwanted rows (191 to 194)
combined_df <- combined_df[-(191:194), ]
#
# 2. Add the new factor level "fresh run 3 & 4" to 'run'
levels(combined_df$run) <- c(levels(combined_df$run), "fresh run 3 & 4")
# 3. Assign "fresh run 3 & 4" to rows 169:189
combined_df$run[169:189] <- "fresh run 3 & 4"
# 4. Also relabel rows 209:213, then drop rows 214:215
combined_df$run[209:213] <- "fresh run 3 & 4"
combined_df <- combined_df[-(214:215), ]
Rows 191–194 have been dropped from combined_df.
A new level “fresh run 3 & 4” has been added, and the re-run samples (rows 169–189 and 209–213) are now labeled accordingly.
For each unique run in combined_df, we will:
1. Filter to that run’s subset of data.
2. If the run has fewer than 4 samples, skip with a
warning.
3. Fit three regression models:
- Linear:
\[
X_{\text{red}} \sim PE\_scaled
\]
- Log:
\[
X_{\text{red}} \sim \log\bigl(PE\_scaled + 0.001\bigr)
\]
- Polynomial:
\[
X_{\text{red}} \sim PE\_scaled \;+\; I\bigl(PE\_scaled^2\bigr)
\]
4. Extract R² and p‐values for each model using
get_stats().
5. Build a ggplot that:
- Plots raw points (geom_point).
- Adds fitted lines:
- Blue = linear model
- Green (dashed) = log model
- Red (dot‐dash) = polynomial model
- Annotates each line’s R²/p‐value in the top‐right corner.
6. Save each run’s plot under
plots/pe_fluor_regressions/Xred_PE_regressions_<run>.png (one shared folder for all runs).
7. Print each plot so it appears in the knitted
HTML.
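get_stats() is defined in process_plate_run.R and not reproduced here. As a rough sketch of what such a helper might do (get_stats_sketch is hypothetical; the real formatting may differ), it extracts R² and the overall F-test p-value from a fitted lm() and builds the annotation string:

```r
# Hypothetical sketch of a get_stats()-style annotation helper.
get_stats_sketch <- function(model, label) {
  s <- summary(model)
  f <- s$fstatistic                               # F value, numerator df, denominator df
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)   # overall model p-value
  sprintf("%s: R2 = %.3f, p = %.3g", label, s$r.squared, p)
}

# Toy usage on a near-linear relationship
toy <- data.frame(x = 1:10,
                  y = 1:10 + c(0.1, 0, -0.1, 0.2, -0.2, 0.1, 0, -0.1, 0.1, 0))
m <- lm(y ~ x, data = toy)
ann <- get_stats_sketch(m, "Linear")
```

The same string is what each `annotate("text", ...)` call in the loop places in the top-right corner of the plot.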
# 1. Ensure 'run' is treated as a factor
combined_df$run <- as.factor(combined_df$run)
# 2. Create base directory for all regression plots
reg_dir <- file.path("output PE", "plots", "pe_fluor_regressions")
dir.create(reg_dir, recursive = TRUE, showWarnings = FALSE)
# 3. Loop over each unique run identifier
for (r in unique(as.character(combined_df$run))) {
# 3a. Subset to only this run
df_sub <- combined_df %>% filter(run == r)
# 3b. Skip if too few data points
if (nrow(df_sub) < 4) {
warning(paste("Skipping run", r, "- not enough data (n =", nrow(df_sub), ")"))
next
}
# 3c. Fit models
model_linear <- lm(Xred ~ PE_scaled, data = df_sub)
model_log <- lm(Xred ~ log(PE_scaled + 0.001), data = df_sub)
model_poly <- lm(Xred ~ PE_scaled + I(PE_scaled^2), data = df_sub)
# 3d. Get stats
ann_linear <- get_stats(model_linear, "Linear")
ann_log <- get_stats(model_log, "Log")
ann_poly <- get_stats(model_poly, "Poly")
# 3e. Create plot
p <- ggplot(df_sub, aes(x = PE_scaled, y = Xred)) +
geom_point(alpha = 0.6, color = "black") +
stat_smooth(method = "lm", formula = y ~ x,
se = FALSE, color = "blue") +
stat_smooth(method = "lm", formula = y ~ log(x + 0.001),
se = FALSE, color = "green", linetype = "dashed") +
stat_smooth(method = "lm", formula = y ~ x + I(x^2),
se = FALSE, color = "red", linetype = "dotdash") +
annotate("text", x = Inf, y = Inf, label = ann_linear,
hjust = 1.05, vjust = 2, color = "blue", size = 4) +
annotate("text", x = Inf, y = Inf, label = ann_log,
hjust = 1.05, vjust = 3.5, color = "green", size = 4) +
annotate("text", x = Inf, y = Inf, label = ann_poly,
hjust = 1.05, vjust = 5, color = "red", size = 4) +
theme_minimal() +
labs(
title = paste("Regression Models: Xred vs PE (", r, ")"),
x = "PE_scaled (mg/g sample × 1000)",
y = "Xred Fluorescence"
)
# 3f. Save the plot (all to one shared folder)
filename <- paste0("Xred_PE_regressions_", r, ".png")
ggsave(
filename = file.path(reg_dir, filename),
plot = p,
width = 10,
height = 6,
dpi = 300,
bg = "white"
)
# 3g. Show plot in knitted HTML
print(p)
}
The analyze_replicates() function will:
- Group replicates by sample (id_col) and perform
calculations on numeric columns (e.g., PE_mg_per_g_sample,
Xred).
- Optionally select the best three replicates
(choose_best_3 = TRUE) by minimizing the coefficient of
variation (CV) before calculating summary stats.
- Write a summary CSV (<output_prefix>_summary.csv)
with one row per sample, containing all summary metrics.
- Assign the summary table to the global environment
(<output_prefix>_summary).

Below we run two passes of analyze_replicates() on
combined_df:
1. Without enhanced CV-based selection
(choose_best_3 = FALSE), saving as
all_rep_analy_summary.csv.
2. With enhanced selection (choose_best_3 = TRUE), saving
as E_rep_analy_summary.csv.

Finally, we use graph_replicates_custom_error() to plot
histograms with error bars for key variables
(PE_mg_per_g_sample and Xred).
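The CV-based selection can be sketched in a few lines of base R. best3_by_cv below is a hypothetical stand-in — the real logic lives inside analyze_replicates() in process_plate_run.R — but it shows the idea: among all three-replicate subsets, keep the one with the lowest coefficient of variation.

```r
# Hypothetical sketch of "best 3 replicates by CV": among all 3-replicate
# subsets, keep the one with the lowest coefficient of variation (sd/mean).
best3_by_cv <- function(x) {
  if (length(x) <= 3) return(x)
  combos <- combn(x, 3)                               # each column = one subset
  cvs <- apply(combos, 2, function(v) sd(v) / mean(v))
  combos[, which.min(cvs)]
}

# The outlier 10 is dropped; the three consistent values remain
best3_by_cv(c(1.0, 1.1, 0.9, 10))
```

Because the number of replicates per sample is small (typically 3-4 wells), the exhaustive `combn()` search is cheap.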
# 1. Analyze replicates without enhanced (best-3) selection
analyze_replicates(
data = combined_df,
id_col = "ID", # column that uniquely identifies each sample
join_col = "join_id", # also used for joining, same as id_col here
weight_col = "sample weight", # column containing sample weight in grams
date_col = "date", # column containing sample collection date
output_prefix = "all_rep_analy", # prefix for output files (will produce all_rep_analy_summary.csv)
choose_best_3 = FALSE # do not filter replicates, use all
)
## # A tibble: 63 × 53
## ID PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Fresh_1 0.165 0.00108 165. 0.00967
## 2 Fresh_2 0.273 0.00161 273. 0.00733
## 3 Fresh_3 0.891 0.00472 891. 0.0208
## 4 Ilo oven d… 0.00142 0.000024 1.42 0.0153
## 5 juice fres… 0.217 0.00276 217. 0.0243
## 6 juice fres… 0.362 0.00335 362. 0.0262
## 7 juice ilo 0.0613 0.000624 61.3 0.0103
## 8 Lab_01 0.0346 0.000326 34.6 0.0228
## 9 Lab_02 0.00228 0.0000360 2.28 0.0138
## 10 Lab_03 0.0417 0.000496 41.7 0.0298
## # ℹ 53 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 48 more variables: X564_mean <dbl>, X592_mean <dbl>, Xred_mean <dbl>,
## # total_PE_mg_mean <dbl>, PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>,
## # PE_scaled_sd <dbl>, X455_sd <dbl>, X564_sd <dbl>, X592_sd <dbl>,
## # Xred_sd <dbl>, total_PE_mg_sd <dbl>, PE_mg_per_g_sample_se <dbl>,
## # PE_mg_per_mL_se <dbl>, PE_scaled_se <dbl>, X455_se <dbl>, X564_se <dbl>, …
# 2. Analyze replicates with enhanced (best-3) selection by CV
analyze_replicates(
data = combined_df,
id_col = "ID", # column uniquely identifying each sample
join_col = "join_id", # join key for replicate grouping
weight_col = "sample weight", # sample weight column (grams)
date_col = "date", # collection date column
output_prefix = "E_rep_analy", # prefix for output files (will produce E_rep_analy_summary.csv)
choose_best_3 = TRUE # select the best 3 replicates (lowest CV) per sample
)
## # A tibble: 63 × 69
## ID PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Fresh_1 0.165 0.00108 165. 0.00967
## 2 Fresh_2 0.273 0.00161 273. 0.00733
## 3 Fresh_3 0.891 0.00472 891. 0.0208
## 4 Ilo oven d… 0.00142 0.000024 1.42 0.0153
## 5 juice fres… 0.217 0.00276 217. 0.0243
## 6 juice fres… 0.362 0.00335 362. 0.0262
## 7 juice ilo 0.0613 0.000624 61.3 0.0103
## 8 Lab_01 0.0229 0.000392 22.9 0.027
## 9 Lab_02 0.00228 0.0000360 2.28 0.0138
## 10 Lab_03 0.0204 0.00044 20.4 0.0163
## # ℹ 53 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 64 more variables: X564_mean <dbl>, X592_mean <dbl>, Xred_mean <dbl>,
## # total_PE_mg_mean <dbl>, PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>,
## # PE_scaled_sd <dbl>, X455_sd <dbl>, X564_sd <dbl>, X592_sd <dbl>,
## # Xred_sd <dbl>, total_PE_mg_sd <dbl>, PE_mg_per_g_sample_se <dbl>,
## # PE_mg_per_mL_se <dbl>, PE_scaled_se <dbl>, X455_se <dbl>, X564_se <dbl>, …
Plain English:
- Define the variables to plot (e.g.,
"PE_mg_per_g_sample", "Xred").
- For each variable, look up its standard-error column
"<variable>_se" (e.g.,
"PE_mg_per_g_sample_se", "Xred_se").
- Call graph_replicates_custom_error() once for
each pair (value_col, se_col).
- Each call creates a folder
"<variable>_replicate_analysis_plots" (based on
output_prefix) and saves a PNG of the bar plot with error
bars.
# 1. Define the variables to plot
variables <- c("PE_mg_per_g_sample", "Xred", "X564")
# 2. Define shared output directory
rep_plot_dir <- file.path("PE", "output_PE", "plots", "replicate_analysis")
dir.create(rep_plot_dir, recursive = TRUE, showWarnings = FALSE)
# 3. Read summary data
E_rep_analy_summary <- read.csv("E_rep_analy_summary.csv")
# 4. Initialize list for interactive plotly plots
interactive_plots <- list()
# 5. Loop through each variable and generate plots
for (var in variables) {
se_col_name <- paste0(var, "_se")
mean_col_name <- paste0(var, "_mean")
output_prefix <- file.path(rep_plot_dir, paste0(var, "_replicate_analysis"))
interactive_plot <- graph_replicates_custom_error(
data = E_rep_analy_summary,
id_col = "ID",
value_col = mean_col_name,
se_col = se_col_name,
output_prefix = output_prefix # e.g. PE/output_PE/plots/replicate_analysis/Xred_replicate_analysis
)
interactive_plots[[var]] <- interactive_plot
}
# 6. Display plots in R Markdown HTML output
htmltools::tagList(interactive_plots)
# Exclude run 4 ("4" is the bind_rows .id label) before re-running the analysis
Combined_df_before <- combined_df[combined_df$run != "4", ]
analyze_replicates(
data = Combined_df_before,
id_col = "ID", # column uniquely identifying each sample
join_col = "join_id", # join key for replicate grouping
weight_col = "sample weight", # sample weight column (grams)
date_col = "date", # collection date column
output_prefix = "E_rep_analy_before",
choose_best_3 = TRUE
)
## # A tibble: 55 × 69
## ID PE_mg_per_g_sample_m…¹ PE_mg_per_mL_mean PE_scaled_mean X455_mean
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Ilo oven d… 0.00142 0.000024 1.42 0.0153
## 2 juice fres… 0.217 0.00276 217. 0.0243
## 3 juice fres… 0.362 0.00335 362. 0.0262
## 4 juice ilo 0.0613 0.000624 61.3 0.0103
## 5 Lab_01 0.0229 0.000392 22.9 0.027
## 6 Lab_02 0.00228 0.0000360 2.28 0.0138
## 7 Lab_03 0.0204 0.00044 20.4 0.0163
## 8 Lab_04 0.0236 0.000372 23.6 0.0233
## 9 Lab_05 0.00569 0.000136 5.69 0.0198
## 10 Lab_06 0.0641 0.00127 64.1 0.0523
## # ℹ 45 more rows
## # ℹ abbreviated name: ¹PE_mg_per_g_sample_mean
## # ℹ 64 more variables: X564_mean <dbl>, X592_mean <dbl>, Xred_mean <dbl>,
## # total_PE_mg_mean <dbl>, PE_mg_per_g_sample_sd <dbl>, PE_mg_per_mL_sd <dbl>,
## # PE_scaled_sd <dbl>, X455_sd <dbl>, X564_sd <dbl>, X592_sd <dbl>,
## # Xred_sd <dbl>, total_PE_mg_sd <dbl>, PE_mg_per_g_sample_se <dbl>,
## # PE_mg_per_mL_se <dbl>, PE_scaled_se <dbl>, X455_se <dbl>, X564_se <dbl>, …
Before <- read.csv("E_rep_analy_before_summary.csv")
E_rep_analy_summary <- E_rep_analy_summary[-1, ]  # drop the first summary row before comparing
se_change <- mean(E_rep_analy_summary$PE_mg_per_g_sample_se, na.rm = TRUE) -
  mean(Before$PE_mg_per_g_sample_se, na.rm = TRUE)
paste("sample se for replicates change by:", se_change)
## [1] "sample se for replicates change by: 0.0042111288742201"
Check effects of enhancement algorithm
PErep_enhanced <- read_csv("E_rep_analy_summary.csv") # retrieve analyzed replicates from enhanced CV selection
PErep_all <- read_csv("all_rep_analy_summary.csv") # retrieve analyzed replicates without enhancement
# Note: PErep_enhanced is a product of analyze_replicates() using enhanced replicate selection.
SE_change <- PErep_enhanced$PE_mg_per_g_sample_max_dev_pct -PErep_all$PE_mg_per_g_sample_max_dev_pct
print(SE_change)
## [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [6] 0.0000000 0.0000000 -64.2502175 0.0000000 -91.3614637
## [11] 0.0000000 -222.4021594 -123.2779150 -50.8332025 -56.5204226
## [16] -85.1448923 -50.2539810 -67.2754758 -73.3978154 -52.2745398
## [21] -0.1267003 -55.1960482 0.0000000 -67.6319484 -183.6385190
## [26] -78.6624204 0.0000000 -48.5333937 -43.4997256 0.0000000
## [31] 0.0000000 0.0000000 0.0000000 -17.8632699 -104.9177229
## [36] -45.7234158 -72.5501845 -56.8479013 -16.8992903 0.0000000
## [41] -56.0968444 -71.7873837 -66.0218326 -49.2541002 -82.4368101
## [46] -24.2029729 0.0000000 0.0000000 0.0000000 0.0000000
## [51] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [56] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [61] 0.0000000 0.0000000 NaN
SE_change <- SE_change[!is.nan(SE_change)] # remove the NaN at the end
paste("# number of replicates SE acted upon:", length(SE_change[SE_change != 0]))
## [1] "# number of replicates SE acted upon: 31"
paste("average improvement by max deviation as a percent of the mean:", abs(mean(SE_change[SE_change != 0], na.rm = TRUE)))
## [1] "average improvement by max deviation as a percent of the mean: 69.29608564262"
Plain English:
- Merge the enhanced replicate summary
(PErep_enhanced) with our sample metadata
(Sample data) so that we can run group-level comparisons
(e.g., by location, variety, life stage).
- Plot a histogram with error bars for
PE_mg_per_g_sample_mean using
graph_replicates_custom_error().
- Run compare_groups() four times to produce boxplots
(with significance letters) for:
  - PE_mg_per_g_sample_mean by Location
  - Xred_mean by Location
  - PE_mg_per_g_sample_mean by variety
  - PE_mg_per_g_sample_mean by Life_S
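compare_groups() also comes from process_plate_run.R. Judging from its printed output, its core might look roughly like this sketch — compare_groups_sketch is hypothetical; the real function additionally draws the boxplots with significance letters, and (as the variety comparison below shows) falls back to t/Wilcoxon tests when there are only two groups:

```r
# Hypothetical sketch of compare_groups(): run a one-way ANOVA plus a
# non-parametric Kruskal-Wallis check of the same response ~ group formula.
compare_groups_sketch <- function(data, response_var, group_var) {
  f <- reformulate(group_var, response = response_var)
  print(summary(aov(f, data = data)))   # parametric comparison
  kw <- kruskal.test(f, data = data)    # rank-based comparison
  print(kw)
  invisible(kw)
}

# Toy usage with three clearly separated groups
toy <- data.frame(y = c(1, 2, 3, 10, 11, 12, 20, 21, 22),
                  g = rep(c("a", "b", "c"), each = 3))
res <- compare_groups_sketch(toy, "y", "g")
```

Running both a parametric and a rank-based test side by side, as the real function does, is a useful sanity check when group sizes are small and normality is doubtful.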
# Merge the enhanced replicate summary with metadata by column "ID".
# This will save "PErep_final.csv" and assign the merged data frame to `PErep_final`.
# 1. Define export folder and create it if needed
final_export_dir <- file.path("PE", "output_PE", "export data", "PErep_joined")
dir.create(final_export_dir, recursive = TRUE, showWarnings = FALSE)
# 2. Join PErep_enhanced with Sample data
joindf_by_id(
df1 = PErep_enhanced,
df2 = `Sample data`,
output_name = file.path(final_export_dir, "PErep_final.csv"),
unmatched_out = file.path(final_export_dir, "PErep_unmatched.csv"),
key_df1 = "ID",
key_df2 = "ID"
)
##
## Values needing repetition:
## Fresh_1 needs to be repeated
## Fresh_2 needs to be repeated
## Fresh_3 needs to be repeated
## Ilo oven dry R02 needs to be repeated
## juice fresh R2 needs to be repeated
## juice fresh R2 needs to be repeated
## juice ilo needs to be repeated
## Lab_72 needs to be repeated
## Lab_73 needs to be repeated
## Lab_74 needs to be repeated
## NA needs to be repeated
## $result_df
## # A tibble: 62 × 78
## join_id `Sample Code` Location variety Ind_ID Life_S `Large Ind?`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Lab_01 R01 Mendieta C.Cham I45 Tetra Yes
## 2 Lab_02 R02 Mendieta C.Cham I38 Tetra Yes
## 3 Lab_03 R04 Mendieta C.Cham I26 Carp Yes
## 4 Lab_04 R02 Mendieta C.Cham I37 Carp Yes
## 5 Lab_05 R05 Mendieta C.Cham I47 Gam Yes
## 6 Lab_06 R04 Concession C.Cham I63 Gam Yes
## 7 Lab_07 R01 Concession C.Cham I49 Gam Yes
## 8 Lab_08 R03 Concession C.Cham I65 Gam Yes
## 9 Lab_09 R02 Concession C.Cham many Gam/Tetra No
## 10 Lab_10 R01 Mendieta F. Glom many Gam/Tetra No
## # ℹ 52 more rows
## # ℹ 71 more variables: `Type of analysis` <chr>,
## # `Supplement for Carageenan` <chr>, ...10 <chr>,
## # PE_mg_per_g_sample_mean <dbl>, PE_mg_per_mL_mean <dbl>,
## # PE_scaled_mean <dbl>, X455_mean <dbl>, X564_mean <dbl>, X592_mean <dbl>,
## # Xred_mean <dbl>, total_PE_mg_mean <dbl>, PE_mg_per_g_sample_sd <dbl>,
## # PE_mg_per_mL_sd <dbl>, PE_scaled_sd <dbl>, X455_sd <dbl>, X564_sd <dbl>, …
##
## $merged_cells
## [1] 52
##
## $unmatched_cells
## [1] 11
##
## $unmatched_saved_to
## [1] "PE/output_PE/export data/PErep_joined/PErep_unmatched.csv"
##
## $assigned_to_global
## [1] TRUE
# 3. Read in the joined summary
PErep_final <- readr::read_csv(file.path(final_export_dir, "PErep_final.csv"))
# 4. Define output directory for replicate analysis plots
rep_plot_dir <- file.path("PE", "output_PE", "plots", "replicate_analysis")
dir.create(rep_plot_dir, recursive = TRUE, showWarnings = FALSE)
# 5. Generate histogram with error bars for PE_mg_per_g_sample_mean
graph_replicates_custom_error(
data = PErep_final,
id_col = "join_id",
value_col = "PE_mg_per_g_sample_mean",
se_col = "PE_mg_per_g_sample_se",
output_prefix = file.path(rep_plot_dir, "E_rep_analy")
)
# 6. Run group comparisons and print outputs
compare_groups(
data = PErep_final,
response_var = "PE_mg_per_g_sample_mean",
group_var = "Location"
)
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 13 0.07250 0.005577 3.96 0.000448 ***
## Residuals 38 0.05352 0.001408
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 10 observations deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: PE_mg_per_g_sample_mean by Location
## Kruskal-Wallis chi-squared = 23.153, df = 13, p-value = 0.03988
compare_groups(
data = PErep_final,
response_var = "Xred_mean",
group_var = "Location"
)
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 13 4.060e+09 312307207 3.786 0.000661 ***
## Residuals 38 3.134e+09 82483424
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 10 observations deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: Xred_mean by Location
## Kruskal-Wallis chi-squared = 14.081, df = 13, p-value = 0.3682
compare_groups(
data = PErep_final,
response_var = "PE_mg_per_g_sample_mean",
group_var = "variety"
)
##
## Welch Two Sample t-test
##
## data: PE_mg_per_g_sample_mean by variety
## t = 2.6909, df = 36.829, p-value = 0.01065
## alternative hypothesis: true difference in means between group C.Cham and group F. Glom is not equal to 0
## 95 percent confidence interval:
## 0.007616961 0.054080989
## sample estimates:
## mean in group C.Cham mean in group F. Glom
## 0.06228289 0.03143392
##
## Wilcoxon rank sum exact test
##
## data: PE_mg_per_g_sample_mean by variety
## W = 338, p-value = 0.03294
## alternative hypothesis: true location shift is not equal to 0
compare_groups(
data = PErep_final,
response_var = "PE_mg_per_g_sample_mean",
group_var = "Life_S"
)
## Df Sum Sq Mean Sq F value Pr(>F)
## Life_S 4 0.00596 0.001489 0.565 0.689
## Residuals 43 0.11335 0.002636
## 14 observations deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: PE_mg_per_g_sample_mean by Life_S
## Kruskal-Wallis chi-squared = 4.1177, df = 4, p-value = 0.3903
After the join, PErep_final includes both replicate-analysis summary columns (means, SE, CV, etc.) and metadata columns (Location, variety, Life_S).
The histogram with error bars (graph_replicates_custom_error) shows the distribution of PE_mg_per_g_sample_mean across all samples, with standard error bars.
The four comparisons (compare_groups) show:
- PE_mg_per_g_sample_mean differs significantly across Location (ANOVA p ≈ 0.0004; Kruskal–Wallis p ≈ 0.04).
- Xred_mean varies by Location under ANOVA (p ≈ 0.0007), although the Kruskal–Wallis test is not significant (p ≈ 0.37).
- PE_mg_per_g_sample_mean differs between varieties, with C.Cham higher than F. Glom (Welch t-test p ≈ 0.011; Wilcoxon p ≈ 0.033).
- PE_mg_per_g_sample_mean shows no significant differences among life stages (ANOVA p ≈ 0.69; Kruskal–Wallis p ≈ 0.39).
Plain English:
- Subset PErep_final to only rows where Life_S is a reproductive life stage ("Gam/Tetra", "Gam", or "Tetra").
- Then run compare_groups() on that subset to compare:
  - PE_mg_per_g_sample_mean by Location
  - Xred_mean by Location
# 1. Create export folder for the life-stage-subset comparisons
red_comp_dir <- file.path("PE", "output_PE", "export data", "PErep_comparisons_red")
dir.create(red_comp_dir, recursive = TRUE, showWarnings = FALSE)
# 2. Subset data to gametophyte/tetrasporophyte life stages
df_var_gamtetra <- PErep_final %>%
filter(Life_S %in% c("Gam/Tetra", "Gam", "Tetra"))
# 3. Compare PE_mg_per_g_sample_mean across locations
res_pe_location <- compare_groups(
data = df_var_gamtetra,
response_var = "PE_mg_per_g_sample_mean",
group_var = "Location"
)
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 10 0.06686 0.006686 8.998 1.04e-05 ***
## Residuals 22 0.01635 0.000743
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 5 observations deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: PE_mg_per_g_sample_mean by Location
## Kruskal-Wallis chi-squared = 19.218, df = 10, p-value = 0.03758
# Save the result as CSV (optional)
#write.csv(res_pe_location, file = file.path(red_comp_dir, "PE_by_location.csv"), row.names = FALSE)
# 4. Compare Xred_mean across locations
res_xred_location <- compare_groups(
data = df_var_gamtetra,
response_var = "Xred_mean",
group_var = "Location"
)
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 10 3.504e+09 350425392 5.657 0.000347 ***
## Residuals 22 1.363e+09 61940902
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 5 observations deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: Xred_mean by Location
## Kruskal-Wallis chi-squared = 10.993, df = 10, p-value = 0.3581
# Save the result as CSV
#write.csv(res_xred_location, file = file.path(red_comp_dir, "Xred_by_location.csv"), row.names = FALSE)